Chinese Named Entity Abbreviation Generation Using First-Order Logic
Abstract
Normalizing named entity abbreviations to their standard forms is an important preprocessing task for question answering, entity retrieval, event detection, microblog processing, and many other applications. With the rapid growth of microblogs, this task has received increasing attention in recent years. In this paper, we propose a novel entity abbreviation generation method that uses first-order logic to model long-distance constraints. To reduce the human effort of manually annotating a corpus, we also introduce a method that automatically constructs training data using simple strategies. Experimental results demonstrate that the proposed method achieves better performance than state-of-the-art approaches.
Similar resources
Vocabulary expansion through automatic abbreviation generation for Chinese voice search
Long named entities are often abbreviated in spoken Chinese, and this usually leads to out-of-vocabulary (OOV) problems in speech recognition applications. The generation of Chinese abbreviations is much more complex than that of English abbreviations, most of which are acronyms and truncations. In this paper, we propose a new method for automatically generating abbreviations for Chinese named en...
Chinese Named Entity Recognition Based on Hierarchical Hybrid Model
Chinese named entity recognition is a challenging yet important task in natural language processing. This paper presents a novel approach based on a hierarchical hybrid model to recognize Chinese named entities. The model integrates three mutually dependent stages: boosting, recognition based on Markov Logic Networks (MLNs), and abbreviation detection. AdaBoost algorithm is utili...
A Preliminary Study on Probabilistic Models for Chinese Abbreviations
Chinese abbreviations are widely used in modern Chinese texts. They are a special form of unknown words, including many named entities, which makes correct processing of Chinese text difficult. In this study, the Chinese abbreviation problem is regarded as an error recovery problem in which the suspect root words are the “errors” to be recovered from a set of candidates. Such a problem is ...
Chinese NER Using CRFs and Logic for the Fourth SIGHAN Bakeoff
We report a high-performance Chinese NER system that incorporates Conditional Random Fields (CRFs) and first-order logic for the fourth SIGHAN Chinese language processing bakeoff (SIGHAN-6). Using current state-of-the-art CRFs along with a set of well-engineered features for Chinese NER as the base model, we consider distinct linguistic characteristics in Chinese named entities by introducing va...
A Framework Based on Graphical Models with Logic for Chinese Named Entity Recognition
Chinese named entity recognition (NER) has recently been viewed as a classification or sequence labeling problem, and many approaches have been proposed. However, they tend to address this problem without considering linguistic information in Chinese NEs. We propose a new framework based on probabilistic graphical models with first-order logic for Chinese NER. First, we use Conditional Random Fi...